Size Matters: Cardinality-Constrained Clustering and Outlier Detection via Conic Optimization

نویسندگان

  • Napat Rujeerapaiboon
  • Kilian Schindler
  • Daniel Kuhn
  • Wolfram Wiesemann
چکیده

Plain vanilla K-means clustering is prone to produce unbalanced clusters and suffers from outlier sensitivity. To mitigate both shortcomings, we formulate a joint outlier-detection and clustering problem, which assigns a prescribed number of datapoints to an auxiliary outlier cluster and performs cardinality-constrained K-means clustering on the residual dataset. We cast this problem as a mixed-integer linear program (MILP) that admits tractable semidefinite and linear programming relaxations. We propose deterministic rounding schemes that transform the relaxed solutions to high quality solutions for the MILP. We prove that these solutions are optimal in the MILP if a cluster separation condition holds. To our best knowledge, we propose the first tractable solution scheme for the joint outlier-detection and clustering problem with optimality guarantees.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Stock Portfolio-Optimization Model by Mean-Semi-Variance Approach Using of Firefly Algorithm and Imperialist Competitive Algorithm

Selecting approaches with appropriate accuracy and suitable speed for the purpose of making decision is one of the managers’ challenges. Also investing decision is one of the main decisions of managers and it can be referred to securities transaction in financial markets which is one of the investments approaches. When some assets and barriers of real world have been considered, optimization of...

متن کامل

Constrained K-means with General Pairwise and Cardinality Constraints

In this work, we study constrained clustering, where some constraints are utilized to guide the clustering process. In existing work on this topic, two main categories of constraints have been explored, namely pairwise and cardinality constraints. Pairwise constraints enforce that the cluster labels of two instances be the same (must-link constraints) or different (cannot-link constraints). Car...

متن کامل

Clustering with Outlier Removal

Cluster analysis and outlier detection are strongly coupled tasks in data mining area. Cluster structure can be easily destroyed by few outliers; on the contrary, the outliers are defined by the concept of cluster, which are recognized as the points belonging to none of the clusters. However, most existing studies handle them separately. In light of this, we consider the joint cluster analysis ...

متن کامل

Outlier Detection Using Extreme Learning Machines Based on Quantum Fuzzy C-Means

One of the most important concerns of a data miner is always to have accurate and error-free data. Data that does not contain human errors and whose records are full and contain correct data. In this paper, a new learning model based on an extreme learning machine neural network is proposed for outlier detection. The function of neural networks depends on various parameters such as the structur...

متن کامل

The Partial Weighted Set Cover Problem with Applications to Outlier Detection and Clustering

We define the partial weighted set cover problem, a generic combinatorial optimization problem, that includes some classical data mining problems as special cases. We prove that it is computationally intractable and give a local search algorithm for this problem. As application examples, we then show how to translate clustering and outlier detection problems into this generic problem. Our exper...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017